Octopus: A Secure and Anonymous DHT Lookup 






Qiyan Wang 

Department of Computer Science 

University of Illinois at Urbana-Champaign 

IL, U.S.A.

qwang26@illinois.edu



Abstract: Distributed Hash Table (DHT) lookup is a core
technique in structured peer-to-peer (P2P) networks. Its decentralized
nature introduces security and privacy vulnerabilities
for applications built on top of them; we thus set out to design 
a lookup mechanism achieving both security and anonymity, 
heretofore an open problem. We present Octopus, a novel DHT 
lookup which provides strong guarantees for both security and 
anonymity. Octopus uses attacker identification mechanisms 
to discover and remove malicious nodes, severely limiting an 
adversary's ability to carry out active attacks, and splits lookup 
queries over separate anonymous paths and introduces dummy 
queries to achieve high levels of anonymity. We analyze the 
security of Octopus by developing an event-based simulator 
to show that the attacker discovery mechanisms can rapidly 
identify malicious nodes with low error rate. We calculate the 
anonymity of Octopus using probabilistic modeling and show 
that Octopus can achieve near-optimal anonymity. We evaluate 
Octopus's efficiency on PlanetLab with 207 nodes and show
that Octopus has reasonable lookup latency and manageable 
communication overhead. 

Keywords-Anonymity, security, DHT, lookup 

I. Introduction 

Structured peer-to-peer networks, such as Chord [1] or
Kademlia [2], allow the creation of scalable distributed
applications that can support millions of users. They have
been used to build a number of successful applications,
including P2P file sharing (Overnet, Kad, Vuze DHT¹) and
content distribution (CoralCDN [3]), and many others have
been proposed, such as distributed file systems [4], anonymous
communication systems [5]-[9], and online social
networks [10], [11]. At the heart of these networks lies a
distributed hash table (DHT) lookup mechanism that imple- 
ments a decentralized key-value store. The DHT allows effi- 
cient storage and coordination among very large collections 
of nodes; however, its decentralized nature creates a number 
of security and privacy vulnerabilities. Because peers have 
to rely on other peers to determine the state of the network, 
malicious nodes could provide misinformation to misdirect 
an honest user's lookups [12]. Likewise, nodes can profile
the lookup activities of other nodes and learn what files or 
websites they are interested in or who their friends may 
be. Recent research shows that anonymous communication 

¹ http://www.vuze.com/



Nikita Borisov 

Department of Electrical and Computer Engineering 

University of Illinois at Urbana-Champaign 

IL, U.S.A. 

nikita@illinois.edu



systems based on anonymity-deficient DHT lookups have 
severe vulnerabilities to information leak attacks [13], [14].

To address these issues, our goal is to design a lookup 
mechanism that achieves both security and anonymity 
(where anonymity means no information is revealed about 
which nodes are looking up which values), heretofore an 
open problem. We note that security is a necessary condition 
for anonymity, because without security, malicious nodes 
can misdirect the lookup towards colluding nodes and learn 
about the lookup target. On the other hand, security is not 
a sufficient condition for anonymity. Some existing lookup 
schemes designed to resist active attacks are not suitable to 
build anonymous DHT systems, e.g., due to heavily relying 
on redundant transmission that leaks information about the 
lookup initiator and/or target [15]-[17]. Furthermore, even
with a secure lookup that itself does not cause information
leak, an inappropriate design of the DHT system can still
lead to anonymity vulnerabilities [14].

Our contributions in this work include:

1) We propose a suite of novel security mechanisms for
attacker discovery in DHT systems. Our mechanisms proactively
identify and remove malicious peers. We develop an
event-based simulator to show that our identification mecha- 
nism is capable of rapidly discovering malicious nodes with 
low error rate. For a network with 20% malicious nodes, it 
can correctly identify all attacking nodes within 30 minutes. 
We compare our scheme with Halo [16], a state-of-the-art
secure DHT system, and show that our scheme provides 
better robustness against active attacks. Furthermore, our 
design does not rely on redundant transmission, and thus 
is suitable to construct anonymous DHT systems. 

2) With the proposed security mechanisms, we put 
forward a secure and anonymous DHT lookup Octopus. 
Octopus splits individual queries used in a lookup over 
multiple anonymous paths, and introduces dummy queries, 
to make it difficult for an adversary to learn the eventual 
target of a lookup. Unlike most previous works that only 
analytically evaluate the systems' anonymity, we use proba- 
bilistic modelling with the help of simulation to calculate 
the information leak, so that users can know how much 
anonymity can be actually provided by the system. We show 
that Octopus provides near-optimal anonymity for both the 



lookup initiator and target. In a network of 100000 nodes
with 20% malicious nodes, Octopus only leaks 0.57 bit of
information about the initiator and 0.82 bit of information 
about the target; these are 6 times and 4 times better 
than what previous works Q, ||8] were able to achieve, 
respectively. 

3) For performance evaluation, we measure the lookup latency
of Octopus on PlanetLab with 207 nodes, and compare
it with the baseline scheme Chord [1] and Halo [16]. The
lookup latency of Octopus is comparable to that of Chord, 
and even better than that of Halo. While Octopus incurs 
relatively higher communication overhead than Chord and 
Halo to provide extra security and/or anonymity guarantees, 
the bandwidth consumption of Octopus is still manageable, 
which is only a few kbps for each node. 

The remainder of the paper is organized as follows. Section II
presents the system model. We describe our security and
anonymity mechanisms in Section III and Section IV. The
efficiency evaluation is provided in Section V, and Section VI
presents the related work. We conclude in Section VII.

II. System Model 

A. Threat Model 

In the same vein of related works |l5|-(l9l, ifTTI . ifTSl . we 
do not consider a global adversary that is capable of controlling
the whole network and observing all communication
traffic. Such a global adversary seems impractical in large-scale
P2P networks. Instead, we assume a partial adversary
that controls a fraction f of all nodes in the network (f is
typically assumed to be up to 20%). Malicious nodes can
behave in an arbitrarily malicious way, such as intercepting,
modifying or dropping any messages going through them, or
injecting fake messages to any other nodes. We also assume
that malicious nodes can log any messages they have seen
and have access to a high-speed communication channel to share
any information with very low transmission delay.

Also similar to related works, we do not attempt to solve
the problem of Sybil attacks [19] in this work. Defending
against Sybil attacks is an active research area that has drawn
a lot of attention; a number of effective solutions have
been proposed, such as [20]-[22]. All these solutions are
applicable to Octopus as extensions to resist Sybil attacks.

B. Design Goals 

The major goal for security is to avoid lookups being 
biased by malicious nodes. In other words, given a lookup 
target, it should not be possible to misdirect the lookup path 
or bias the final lookup result. 

Pfitzmann and Hansen defined several relevant anonymity 
properties for message-based communication, such as sender 
and receiver anonymity [23]. We consider equivalent properties
in the context of DHT lookups.

• Initiator anonymity: given a lookup target, it should not
be possible to determine its initiator.

• Target anonymity: given a lookup initiator, it should not
be possible to determine its target.

• Query unlinkability: given several queries with known
targets, it should not be possible to find out if they came
from the same initiator.

III. Security Mechanisms of Octopus 
A. Problem Description 

In a DHT system like Chord [1], each node is assigned a
unique ID associated with its IP address, and owns the IDs
from itself to its direct predecessor on the ring. Each node X
maintains a list of O(log N) contact nodes (called fingers),
where N is the network size, and the i-th finger of X is
the owner (or successor) of the ID id_X + 2^(i-1) (the first
finger, i.e., i = 1, is X's direct successor). Besides, each
node maintains a list of successor nodes for stabilization.
Some DHTs [24] also utilize the list of successors during
lookups to speed up the lookup process in the last few hops.

We study the more general case where the successor lists 
are used in lookups. We refer to the combination of the 
fingertable and the successor list as the routing table. More 
specifically, we consider the following lookup procedure. 
Assuming a lookup initiator I wants to find the owner of
value v, it first queries n_1, the node that is closest to v in
its routing table, and n_1 sends its routing table to I.² Then,
out of n_1's routing table, I finds the node n_2 closest to
v and asks n_2 for its routing table. The process proceeds
iteratively until reaching n_k, for which one of its successors
is the owner of v (i.e., the lookup target).
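
The iterative procedure above can be illustrated with a minimal sketch on an honest, static ring. The ring size (`RING_BITS`), the successor-list length, and the helper names (`owner`, `routing_table`, `lookup`) are assumptions made for this illustration and are not taken from the Octopus implementation. Fetching a routing table is modeled as a local function call, whereas in Octopus each query would travel over an anonymous path.

```python
from bisect import bisect_left

RING_BITS = 16            # assumed identifier length, for illustration only
RING = 1 << RING_BITS


def clockwise(a, b):
    """Clockwise distance from ID a to ID b on the ring."""
    return (b - a) % RING


def owner(sorted_ids, key):
    """Owner (successor) of key: the first node ID at or after key."""
    i = bisect_left(sorted_ids, key)
    return sorted_ids[i % len(sorted_ids)]


def routing_table(sorted_ids, node, n_succ=4):
    """Full routing table of an honest node: its fingers and its successor list."""
    # Offsets 2^0 .. 2^(RING_BITS-1) correspond to fingers i = 1 .. RING_BITS.
    fingers = {owner(sorted_ids, (node + (1 << i)) % RING) for i in range(RING_BITS)}
    idx = sorted_ids.index(node)
    succs = [sorted_ids[(idx + j) % len(sorted_ids)] for j in range(1, n_succ + 1)]
    return fingers, succs


def lookup(sorted_ids, initiator, key):
    """Iterative lookup: the initiator never reveals key to the queried nodes."""
    current, queried = initiator, []
    while True:
        if current == key:
            return current, queried
        fingers, succs = routing_table(sorted_ids, current)
        to_key = clockwise(current, key)
        # Stop when one of the successors of the current node owns the key.
        for s in succs:
            if clockwise(current, s) >= to_key:
                return s, queried
        # Otherwise move to the known node that most closely precedes the key.
        closer = [c for c in fingers | set(succs) if clockwise(c, key) < to_key]
        current = min(closer, key=lambda c: clockwise(c, key))
        queried.append(current)
```

In this sketch `queried` corresponds to the sequence n_1, ..., n_k; each step strictly reduces the clockwise distance to v, so the loop terminates.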

Some of the queried nodes could be malicious and attempt 
to launch the following attacks. 

1) Lookup Bias Attack: If the last queried node n_k is
malicious, it can replace the honest nodes in its successor
list with malicious nodes, so that one of its (malicious)
"successors" will be regarded as the lookup target.

2) Lookup Misdirection Attack: Instead of trying to bias
the lookup result, malicious nodes could attempt to make I
query more malicious nodes during the lookup, by providing
manipulated fingertables (i.e., replacing honest fingers with
malicious nodes). This attack is a big threat to anonymity
since the adversary can learn more information about the
lookup target from a larger number of queried malicious
nodes [14].

3) Finger Pollution Attack: In a DHT system, each node 
periodically performs lookups to update its fingers. An attack 
related to this is that malicious nodes could attempt to 
pollute honest nodes' fingertables during the finger-update 
lookups, so that the polluted fingertables can contribute to 
the lookup bias and misdirection attacks. 

² In vanilla DHT lookups, I tells v to each queried node, which will
return the finger closest to v. However, this reveals the lookup target to
malicious intermediate nodes. Hence, for anonymous lookups, I asks each
intermediate node for its full routing table, without revealing v.



B. Security Mechanisms 

Many existing secure DHT designs ||6l, ifTSI . lfT6l employ 
redundant queries or lookups to tolerate misinformation 
provided by malicious nodes. However, the redundant trans- 
mission creates more opportunities for an adversary to gain 
information about the lookup initiator and/or target llTSI . 
Some schemes iflTl . ifTSl utilize quorum-based topologies 
and threshold cryptography to limit byzantine adversaries, 
but they also require the initiator to contact multiple nodes 
at each step of the lookup (for cryptographic operations), 
which accelerates information leak. Some other schemes,
such as Myrmic [25], prevent routing table manipulation by
introducing a central trusted authority to sign each node's
routing table; however, this approach is impractical since
for each node join or churn, the authority has to regenerate
the signatures for all related nodes, creating a performance
bottleneck.

To effectively limit active attacks while minimizing infor- 
mation leak, we propose a new defense strategy by letting 
each (honest) node secretly check the correctness of other 
nodes' routing tables. Such checks are performed offline
(i.e., independent of lookups), and thus do not reveal any
information about lookups. To punish discovered malicious
nodes, we use a certificate authority (CA) to issue certificates 
and revoke certificates from identified malicious nodes so 
that the malicious nodes can be gradually removed from the 
system. We note that the CA in our case is fundamentally 
different from that of Myrmic [25]. The latter is required
to be online all the time and needs to update signatures
for multiple nodes for each node churn/join. In contrast, the
certificates in our scheme are independent of nodes' routing
states and thus do not need to be updated frequently. Our
simulation results show that the workload of our CA is 
sufficiently low and can be handled by most Internet servers. 

There are several fairly efficient and scalable revocation
mechanisms in the literature, such as Merkle Hash
Tree based certificate revocation [26], efficient distribution
of revocation information over P2P networks [27], and
scalable PKI based on P2P systems [28]. Since certificate
management in our scheme is essentially the same as these 
systems, we do not particularly study certificate revocation 
in this work. 

1) Secret Neighbor Surveillance: To limit the lookup bias
attack, we propose secret neighbor surveillance, a mechanism
that prevents malicious neighbors from manipulating
their successor lists.

In particular, we let each node maintain a predecessor 
list, in the same way as maintaining the successor list 
(i.e., periodically running the Chord stabilization protocol
anticlockwise). The predecessor list is of the same size as the
successor list, and thus each node X should be contained in
the successor list of any of its predecessors. In other words,
if X is not contained in the successor list of its predecessor,
it means this predecessor is trying to manipulate its
successor list by replacing X with another node. Our goal
is to let X detect this.




Figure 1: Secret neighbor surveillance. X is checking if itself is
included in its predecessor P3's successor list. If not, it means P3
manipulates its successor list by replacing X with another node.



In the lookup bias attack, a malicious node provides a 
manipulated successor list in response to lookup queries. 
Therefore, we let X anonymously send a "lookup query"
to one of its predecessors, say P3 (see Figure 1), and check
if X itself is included in P3's successor list. "Anonymously"
is required since if the malicious node can distinguish a
testing query from real lookup queries based on the querier's
identity, it can always provide the correct successor list
for testing queries to avoid being detected. The anonymous
transmission can be achieved by using the basic onion
routing technique [29], i.e., X chooses two random peers
as relays to forward its query to P3 while using onion
encryption to ensure each hop on the forwarding path can
only know its previous and next hops. The two relay nodes
can be found by performing an l-hop random walk on the
overlay network (where l = O(log N)). The details of the
random walk are provided in Appendix A.

X performs the above checks from time to time (i.e., 
with time interval t_c ∈_R (0, T_m], where T_m is the maximum
checking interval) on randomly selected successors. A de- 
tected malicious node will be reported to the CA. To provide 
a non-repudiation proof on a manipulated successor list (i.e., 
verifiable to the CA), we let each node sign its routing table 
and attach a time stamp to it. 
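
The check described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `SignedSuccessorList`, `neighbor_check`, and `next_check_delay` are hypothetical names, the `signature` field is a placeholder with no real cryptography, and the anonymous two-relay path is abstracted as a callable `query_anonymously` supplied by the caller.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedSuccessorList:
    """A successor list as returned by a queried node (signature is a placeholder)."""
    owner: int
    successors: tuple
    timestamp: float
    signature: bytes = b""


def next_check_delay(t_max, rng=random):
    """Draw the next checking interval uniformly from (0, t_max]."""
    return t_max * (1.0 - rng.random())   # rng.random() lies in [0, 1)


def neighbor_check(x, predecessors, query_anonymously, rng=random):
    """Return None if the chosen predecessor lists x, else the reply as evidence.

    query_anonymously(p) is assumed to deliver a "lookup query" to p without
    revealing that x is the querier, and to return a SignedSuccessorList.
    """
    p = rng.choice(predecessors)
    reply = query_anonymously(p)
    if x in reply.successors:
        return None
    return reply   # signed and time-stamped, so it could be reported to the CA
```

Returning the signed reply rather than a boolean mirrors the text: the report sent to the CA needs the manipulated list itself as a non-repudiable proof.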

Figure 2: Successor list pollution. The malicious successor P2
pollutes P3's successor list in P3's stabilization by providing a
manipulated successor list.



Another strategy to launch the lookup bias attack is to 
pollute honest nodes' successor lists during stabilization. For 
example, as shown in Figure 2, assuming P2 is malicious
and P3 is honest, P2 can send P3 a manipulated successor
list excluding X during P3's stabilization, so that P3 will
regard X as dead and remove X from its successor list;
consequently, P3 will be mistakenly identified as a malicious
predecessor by X while P2 remains undiscovered. To deal
with this, we let each node sign its successor list used 
in stabilization; also, each node keeps a queue of latest 
received successor lists in stabilization as proof, to prove 
that its successor list is not intentionally manipulated. For 
example, P3 can provide its proof to the CA showing 
that its successor list is correctly computed according to 
the information provided by P2. If P3's proof is verified
(according to the stabilization algorithm) by the CA, then 
the suspicion on P3 is cleared and the CA will request P2 
for its proof, and check it against P3's proof. This process 
is repeated until finding a node that cannot provide valid 
proofs and this node is then judged as the malicious node. 

2) Secret Finger Surveillance: Likewise, we propose a 
secret finger surveillance mechanism for the lookup misdi- 
rection attack by limiting malicious nodes from manipulating 
their fingertables in the lookup. 

Figure 3: Secret finger surveillance. X is checking if node Y has
replaced its finger F with a malicious node F'. X first asks F' for
its predecessor list, and then anonymously sends a "lookup query"
to a random node P'_1 in F''s predecessor list. If any node in P'_1's
successor list is closer to the ideal finger ID than F', X detects that Y
manipulated its fingertable.

In particular, we let each node keep a small number of 
received fingertables (e.g., from lookups, secret neighbor 
surveillance, or random walks). From time to time, node X 
chooses a random finger from one of the kept fingertables, 
say the i-th finger F' of node Y, and asks F' for its
predecessor list (see Figure 3). Then, after waiting a short
random period of time, X anonymously sends a "lookup
query" to a random predecessor of F' (say P'_1), and checks
if any node in P'_1's successor list is closer to the ideal finger
ID than F' (i.e., F' is not the true finger F).

The intuition behind this is that if Y replaces an honest
finger F with a malicious node F', at least one of F''s (true)
predecessors should be closer to the ideal finger ID than F'.
Hence, if F' provides X with its true predecessor list, the
fingertable manipulation will be detected. Therefore, F' has
to manipulate its predecessor list to ensure that all the
"predecessors" are malicious, so that the selected predecessor P'_1
can collude with F' by providing a manipulated successor
list that is consistent with the predecessor list provided by
F'. On the other hand, however, P'_1 cannot freely manipulate
its successor list, since P'_1 is under surveillance by its
neighbors (i.e., secret neighbor surveillance). Therefore, if
the adversary tries to manipulate a single finger (F → F'),
she has to sacrifice at least one malicious node, either P'_1 or
F' and Y.
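
The distance test at the core of secret finger surveillance can be sketched as below. This is one reading of the mechanism, stated as an assumption: a node counts as "closer to the ideal finger ID than F'" if it lies clockwise between the ideal ID id_Y + 2^(i-1) and F'. The sketch inspects both the predecessor list returned by F' and the successor list of one randomly chosen predecessor. The names `finger_check` and `get_successors_anonymously` and the 16-bit ring are hypothetical.

```python
import random

RING = 1 << 16   # assumed identifier space, for illustration only


def clockwise(a, b):
    """Clockwise distance from ID a to ID b on the ring."""
    return (b - a) % RING


def closer_than(ideal, claimed, candidates):
    """Candidates lying clockwise between the ideal ID and the claimed finger."""
    limit = clockwise(ideal, claimed)
    return [c for c in candidates if clockwise(ideal, c) < limit]


def finger_check(y, i, claimed, claimed_pred_list, get_successors_anonymously,
                 rng=random):
    """Return witnesses showing that `claimed` is not the true i-th finger of y.

    An empty list means the claimed finger passed this (probabilistic) check.
    """
    ideal = (y + (1 << (i - 1))) % RING
    # A truthful predecessor list already exposes a node closer to the ideal ID.
    witnesses = closer_than(ideal, claimed, claimed_pred_list)
    if witnesses:
        return witnesses
    # Otherwise cross-check against one random predecessor's successor list.
    p = rng.choice(claimed_pred_list)
    return closer_than(ideal, claimed, get_successors_anonymously(p))
```

A colluding predecessor that returns a consistent successor list yields an empty result, which corresponds to a false negative for a single check; repeated checks over time, together with secret neighbor surveillance on that predecessor, are what the argument above relies on.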

3) Secure Finger Update: We can invoke the secret finger
surveillance to limit the finger pollution attack: when X
obtains the result (say F') of the finger-update lookup, it asks
F' for its predecessor list, and chooses a random predecessor
P'_1 of F' to perform the same checks as in the secret finger
surveillance to verify F' is the true finger; X uses F' to
update its fingertable only when F' passes these checks.

C. Security Evaluation 

We use the following metrics to evaluate our security 
mechanisms. 

• fraction of remaining malicious nodes,

• false positive rate, i.e., the chance that an honest node
is judged as a malicious node,

• false negative rate, i.e., the chance that a malicious
node is not identified when being tested by a node,

• false alarm rate, i.e., the chance that there is no node
identified in a report sent to the CA.

These metrics represent different aspects of security prop- 
erties. Reduction of malicious nodes shows effectiveness, 
false positive/negative rates represent accuracy, and false 
alarm rate indicates efficiency. 

1) Experiment Setup: We developed an event-based sim- 
ulator in C++ with about 3.0 KLOC. We consider a WAN 
setting, where latencies between each pair of peers are 
estimated using the King dataset [30]. We model node
churn/join as an exponential distribution process
$f(x) = \frac{1}{\lambda} e^{-(1/\lambda)x}$ with mean lifetime λ minutes. We generate
random network topologies of size N = 1000 with 20%
malicious nodes. Each node maintains 12 fingers and 6
successors/predecessors (on the order of O(log N)). With
similar configurations as related works |IT], ||9], we let each 
node run successor and predecessor stabilization protocols
every 2 seconds, and perform a finger update every 30 seconds.
To discover malicious nodes, each peer performs security
checks of secret neighbor surveillance and secret finger
surveillance every 60 seconds³. To ensure high identification
accuracy, each node keeps 6 latest received successor lists as 

³ This is based on the frequency of stabilization and finger update as well
as the node churn rate, and we found that doing security checks every 60s
is sufficient to rapidly discover malicious nodes.

Figure 4: Simulation results of the security mechanisms



proofs. Besides, we let each node perform one lookup every 
minute (we choose 1 min only because we want to test a 
large number of lookups within a relatively short simulation 
time). 

2) Experimental Results: We see from Figure 4a that the
secret neighbor surveillance mechanism can rapidly identify
malicious nodes that try to bias lookups. After a short time
(20 mins), almost all malicious nodes are discovered. In
comparison, the secret finger surveillance mechanism discovers
malicious nodes more slowly
(as shown in Figure 4b), but it can still identify over 80% of
malicious nodes within 30 mins. The speed of identifying
malicious nodes by the secure finger update mechanism is
faster than that of the secret finger surveillance (as shown in
Figure 4c), because the former is performed more frequently
(at each finger update) and some malicious fingers contained
in the successor list can also be detected by the secret
neighbor surveillance.

The accuracy of our attacker discovery mechanisms is 
shown in Table I. The false positive rate is for all the three
mechanisms even when the churn rate is very high (e.g., 
the mean life time for each node is 10 mins). This ensures 
that honest nodes will not be judged as malicious nodes 
by mistake. In addition, the secret neighbor surveillance 
has very low false negative rate (less than 0.6%), which 
implies that any malicious nodes that try to bias lookups
can be caught with high probability. We also see that
the false negative rates for the secret finger surveillance
and secure finger update mechanisms are relatively higher.
This is because a malicious finger can pass the security
checks if the randomly selected predecessor happens to be
a colluding node and provides a successor list consistent
with the malicious finger. However, over time, a malicious
node can be identified with very high probability as shown
in Figure 4.

We also compare our scheme with a state-of-the-art secure
DHT scheme, Halo [16], in terms of the number of biased
lookups over time. We calculate the ratio of biased lookups
of Halo according to their analysis results ([16], §4.1) using the
parameter l = 7 as they suggested. We can see from Figure 5
that after a short period of time, there are no more biased
lookups in Octopus, while the number of biased lookups
of Halo keeps increasing linearly with the total number
of lookups. This demonstrates that our security mechanisms
can fundamentally thwart active attacks.

Figure 5: Security comparison

Finally, we evaluate the workload of the CA in terms of
the number of messages (including reports, proofs, etc.)
processed over time. We can see from Figure 6 that even
during the peak time (the first 10 min), the CA only needs 
to process about 2 messages per second on average, which 
can be handled by most Internet servers. 

Figure 6: The CA's workload



D. Other Attacks, Countermeasures, and Discussion 

We also consider other potential attacks, such as the selec- 
tive denial-of-service attack and the relay exhaustion attack; 
we discuss these attacks and propose countermeasures in 
Appendix. 



Table I: False positive/negative/alarm rates of the security mechanisms. λ is the mean lifetime of each node (in minutes). Attack rate is 100%.
In fingertable manipulation/pollution attacks, checked malicious predecessors provide manipulated successor lists with 50% chance.

IV. Anonymity Mechanisms of Octopus

A. Problem Description 

In vanilla DHT systems, since the lookup initiator I
queries intermediate nodes directly, the queried nodes can
easily infer I's identity. A natural idea is to let I hide its
identity by sending queries through an anonymous path, as
in the attacker identification mechanisms. An illustration of this
is shown in Figure 7.

However, we note that a single anonymous path is insufficient
to achieve high levels of anonymity. We use the
example in Figure 7 to show this. Assume queried nodes
E_2 and E_4 are malicious, and the first relay A is also
malicious. With a single anonymous path, the adversary can
learn that E_2 and E_4 belong to the same lookup since they
are contacted by the same exit node B. Wang et al. [14]
have shown that based on the positions of a few queried
malicious nodes in the lookup, the adversary can narrow the
range of the lookup target into a small set of nodes (called
the range estimation attack). Suppose there are c concurrent
lookups each having an estimation range of d nodes; then,
the adversary can know that I is doing a lookup and its
target is one of the c · d nodes.

B. Anonymity Mechanisms 

To address the limitation of a single anonymous path, we 
propose to split lookup queries over multiple anonymous 
paths, as shown in Figure 8.

Using separate anonymous paths for different queries 
effectively disassociates the adversary's observations. The 
adversary only sees disjoint events from different concurrent 
lookups, but is unable to group queries belonging to the same 
lookup; in this case, it is much harder to apply the range 




Figure 8: The structure of multiple anonymous paths in Octopus. 
A, B, Ci and Di are relays. 



estimation attack, thus substantially limiting the information 
leak. 

Moreover, to further blur the adversary's observations and 
make the range estimation attack even harder, we propose to 
add dummy queries in the lookup. Then, even though in rare 
cases the adversary can link two queried nodes in the same 
lookup (e.g., when all the relays used for the two queries are 
malicious), the adversary is unable to tell whether they are 
dummy queries or true queries, and the result of the range 
estimation attack would be incorrect with a dummy query. 

We note that using multiple anonymous paths is important 
to ensure effectiveness of dummy queries, because in the 
single-anonymous-path scenario, observed queries are link- 
able due to the common exit relay and hence dummy queries 
can be distinguished based on the positions of observed 
queries. In comparison, with multiple anonymous paths, 
identifying dummy queries is much harder.

C. Anonymity Evaluation 

We analyze the best strategies for an adversary to infer the
lookup target T and the initiator I based on its observations,
and calculate the target anonymity H(T) and the initiator
anonymity H(I). We use entropy to quantify H(T) and
H(I). We let O denote the set of possible observations of
the adversary (including null observation). To measure the
system as a whole, we have:

$$H(T) = \sum_{o \in O} P(o) \cdot H(T|o), \quad H(I) = \sum_{o \in O} P(o) \cdot H(I|o) \qquad (1)$$

where P(o) is the probability of observation o occurring.
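
As an illustration of Equation (1), the following sketch computes the expected conditional entropy from a list of observations, each given as a pair of its probability P(o) and the adversary's posterior distribution over candidates. The function names and the toy numbers are assumptions chosen for this example and are not results from the paper.

```python
import math


def entropy(dist):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def expected_entropy(observations):
    """Equation (1): sum over observations o of P(o) * H(. | o)."""
    return sum(p_o * entropy(posterior) for p_o, posterior in observations)


# Toy example with 8 equally likely candidates:
# with probability 0.9 the adversary learns nothing (uniform over 8 candidates),
# with probability 0.1 it narrows the candidates down to 2 equally likely ones.
toy = [(0.9, [1 / 8] * 8), (0.1, [1 / 2] * 2)]
h = expected_entropy(toy)       # 0.9 * 3 bits + 0.1 * 1 bit
leak = math.log2(8) - h         # shortfall from the 3-bit maximum
```

Reading the information leak as the gap between the maximum entropy and the expected entropy is an interpretation adopted here for the toy example.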

To calculate the maximum information leak, we make the 
following assumptions in the anonymity calculation. First, 
we assume the network is static, since network dynamics 
can obscure the adversary's observations and make it more 
difficult to extract information about the initiator/target. 



Second, we only consider passive attacks, as active attackers 
can be quickly identified by our security mechanisms and 
consequently the adversary will lose observers to carry out 
passive attacks. 

1) The Adversary's Observations: An observation o con- 
sists of a (large) number of observed events, which are mes- 
sage transmissions seen by malicious nodes. Each observed 
event can provide information, such as sender/receiver IDs, 
message content, and transmission time. The adversary can 
log all the observed events, and try to derive useful infor- 
mation from any combinations of them. 

Observations of queries. There are two cases where a
query is observed (i.e., the queried node is identified): 1)
the queried node itself is malicious, or 2) the exit relay is
malicious. The adversary can link an observed query backwards
to I in two ways. One is through direct connection of
compromised relays on the anonymous path. For example, if
the relays A and C_i and the queried node E_i are malicious
(refer to Figure 8), E_i is observed and can be linked to I
using C_i and A as bridges. The other is through linkability
of relays to I in the random walk. For instance, if D_i is
compromised and linkable to I in the random walk, then E_i
is observed and linkable to I using D_i as a shortcut. It is
possible to use both approaches at the same time.

Furthermore, considering all queries in the same lookup
together can help the adversary link more queries to I. Since
all (C_i, D_i) in the same lookup are connected to the same
relay B, if there exists one query linkable to both I and B,
then any other queries that are linkable to B can also be
linked to I. In the rest of the paper, we use "a linkable query" to
specifically refer to a query that is observed and linkable to
I.

Observations of the initiator. The adversary's goal is
to link I and T; knowing only one of them is useless to
the adversary. Therefore, the pre-condition of compromising
the target anonymity is to observe I. Since I is directly
connected to A, the adversary can observe I as long as A is
malicious. In addition, I is also observable in random walks;
hence, another case for I being observed is that there exists
at least one malicious relay linkable to I in a random walk.

Observation of the target. For similar reasons, in order 
to compromise the initiator anonymity, the adversary has 
to know T. We note that T is not necessarily contacted 
during the lookup, since the aim of the lookup is to find 
the IP address of T, which can be learnt from query replies 
of intermediate nodes (after the lookup is done, T might 
be contacted due to some application needs, but we do not 
consider this as part of the lookup). Yet we assume that each
node can tell whether it is a target node based on its role
in the application. For example, in DHT-based anonymous
communication, a node can learn that it is a lookup target
if it is selected as a relay of an anonymous circuit. Therefore,
we regard T as observed if T itself is a malicious node.



2) Initiator Anonymity: To calculate the initiator
anonymity, we divide the observations of the adversary into
two categories.

• $o_n$: the observation occurring when $T$ is not observed
• $O_o$: the set of observations occurring when $T$ is observed

According to [T], $H(I)$ is calculated as:

$$H(I) = P(o_n) \cdot H(I \mid o_n) + \sum_{o_o \in O_o} P(o_o) \cdot H(I \mid o_o) \tag{2}$$

When $T$ is not observed, the entropy of $I$ is maximized.
Presuming that the adversary can exclude malicious nodes
from the anonymity set of $I$, we have:

$$H(I \mid o_n) = \log_2\big((1-f) \cdot N\big) \tag{3}$$

Let $\mathcal{R}^l_T$ denote the set of non-dummy queries linkable to
$I$ in the lookup whose target is $T$. Then, based on whether
$\mathcal{R}^l_T$ is an empty set, we calculate $H(I \mid o_o)$ as follows:

$$H(I \mid o_o) = P(\mathcal{R}^l_T = \emptyset) \cdot H'(I \mid o_o) + \big(1 - P(\mathcal{R}^l_T = \emptyset)\big) \cdot H''(I \mid o_o) \tag{4}$$

When there is no linkable non-dummy query, $I$ is unlinkable
with $T$. However, since some of the initiators of
concurrent lookups can be observed by the adversary, we
have:

$$H'(I \mid o_o) = P(I\_obsv) \cdot \log_2(\#obsv\_hon\_init) + \big(1 - P(I\_obsv)\big) \cdot \log_2\big((1-f) \cdot N\big) \tag{5}$$

Let $\Psi^l$ denote the set of concurrent lookups that have at
least one linkable query, and $\psi_T$ denote the lookup with
target $T$. When $\mathcal{R}^l_T \neq \emptyset$, $\psi_T \in \Psi^l$. Each lookup in $\Psi^l$
could be $\psi_T$. Therefore, we have:

$$H''(I \mid o_o) = -\sum_{\psi \in \Psi^l} P(\psi = \psi_T \mid o_o) \cdot \log_2 P(\psi = \psi_T \mid o_o) \tag{6}$$
Because the density of queries close to the target is
higher than in other regions of the ring, for $\psi_T$ it is highly
likely that the last queried node in $\mathcal{R}^l_T$ is located very
close to $T$. Therefore, the adversary can assign a probability
to each candidate initiator based on the minimum distance
(i.e., number of hops) between its queried nodes and $T$. In
particular, let $Q^l_\psi$ denote the set of linkable queries in $\psi$,
$\psi \in \Psi^l$, and let $\xi(x)$ denote the probability that for $\psi_T$
the minimum distance from linkable queried nodes to $T$ is
$x$. $\xi(x)$ can be obtained via pre-simulations of the lookup.
Then, we can calculate $P(\psi = \psi_T \mid o_o)$ as follows:

$$P(\psi = \psi_T \mid o_o) = \frac{\xi\big(\min_{E \in Q^l_\psi} dist(E, T)\big)}{\sum_{\psi' \in \Psi^l} \xi\big(\min_{E' \in Q^l_{\psi'}} dist(E', T)\big)} \tag{7}$$
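As an illustration of Equations (6) and (7), the following Python sketch normalises per-lookup weights and computes the resulting entropy. The function names and the geometric `xi_demo` are assumptions made for this example only; in the analysis above, $\xi(x)$ comes from pre-simulations of the lookup.

```python
import math

def candidate_probabilities(min_dists, xi):
    """Equation (7): weight each candidate lookup by xi(min distance), then normalise."""
    weights = [xi(d) for d in min_dists]
    total = sum(weights)
    return [w / total for w in weights]

def entropy_bits(probs):
    """Equation (6): Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def xi_demo(x):
    """Hypothetical stand-in for xi(x): closer queries are more likely."""
    return 0.5 ** x

# Minimum hop distance to T for four concurrent lookups (made-up values).
min_dists = [1, 3, 3, 5]
probs = candidate_probabilities(min_dists, xi_demo)
h_double_prime = entropy_bits(probs)
```

With four candidate lookups the entropy is bounded by $\log_2 4 = 2$ bits; the skew introduced by `xi_demo` pushes it below that bound.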



3) Target Anonymity: We categorize the adversary's observations
into three classes:

• $o_n$: the observation occurring when $I$ is not observed

• $O_l$: the set of observations occurring when there is at
least one linkable query in the lookup

• $O_d$: the set of observations occurring when there is no
linkable query in the lookup

According to Equation ^, we have: 



$$H(T) = P(o_n) \cdot H(T \mid o_n) + \sum_{o_l \in O_l} P(o_l) \cdot H(T \mid o_l) + \sum_{o_d \in O_d} P(o_d) \cdot H(T \mid o_d) \tag{8}$$

Since the adversary has to know $I$ in the first place, the
entropy of $T$ is maximum when $I$ is not observed, i.e.,
$H(T \mid o_n) = \log_2 N$.

Calculation of $H(T \mid o_l)$. When there exist queries that
are linkable to $I$, the adversary can mount a range estimation
attack to narrow the range of $T$. The output of this attack is
a lower bound and an upper bound of $T$'s location on the
ring.

We temporarily assume that all queries observed by the
adversary are non-dummy queries (how to deal with dummy
queries is shown later). Suppose there are two or more
linkable queries in the lookup. Let $E_i$ and $E_j$ denote the
first and the last linkable queried nodes, respectively. Since
nodes succeeding $T$ will not be queried in the lookup, $E_j$
can be used as a lower bound of $T$. An upper bound of $T$
can be obtained based on the fact that the lookup always
greedily queries the finger that most closely precedes the
target. In particular, the adversary first determines the queried
nodes between $E_i$ and $E_j$ by (locally) simulating the lookup
from $E_i$ to $E_j$. The initial upper bound is set to $E_i$; then
for each pair of consecutive queries $(E_k, E_{k+1})$ between $E_i$
and $E_j$, $i \le k \le j-1$, the adversary finds the index of
$E_{k+1}$ in $E_k$'s finger table (say $p$) and uses the $(p+1)$-th
finger of $E_k$ to update the upper bound.
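The bound computation described above can be sketched in Python on a toy Chord ring. Everything here is an assumption for illustration: the 8-bit identifier space, the evenly spaced node IDs, and the helper names (`fingers`, `lookup_path`, `estimate_range`) are not taken from the Octopus simulator, and fingers are modelled as the successors of $n + 2^k$.

```python
from bisect import bisect_left

M = 8                      # identifier bits; a toy value assumed for illustration
RING = 1 << M

def cw(a, b):
    """Clockwise distance from a to b on the identifier ring."""
    return (b - a) % RING

def successor(nodes, ident):
    """First node with ID >= ident, wrapping around; nodes must be sorted."""
    i = bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]

def fingers(nodes, n):
    """Distinct Chord fingers of n, ordered by clockwise distance from n."""
    out = []
    for k in range(M):
        f = successor(nodes, n + (1 << k))
        if f != n and f not in out:
            out.append(f)
    return out

def next_hop(nodes, cur, key, inclusive):
    """Farthest finger of cur that precedes key (or equals it, if inclusive)."""
    limit = cw(cur, key)
    best = None
    for f in fingers(nodes, cur):
        d = cw(cur, f)
        if d < limit or (inclusive and d == limit):
            best = f
    return best

def lookup_path(nodes, start, key, inclusive=False):
    """Nodes visited by a greedy lookup from start towards key."""
    path = [start]
    while True:
        nxt = next_hop(nodes, path[-1], key, inclusive)
        if nxt is None:
            return path
        path.append(nxt)

def estimate_range(nodes, e_first, e_last):
    """Return (lower, upper) bounds on T from the first and last linkable queries."""
    lower, upper = e_last, e_first
    # Locally simulate the lookup from E_i to E_j to recover the hops in between.
    virtual = lookup_path(nodes, e_first, e_last, inclusive=True)
    for cur, nxt in zip(virtual, virtual[1:]):
        table = fingers(nodes, cur)
        p = table.index(nxt)
        if p + 1 < len(table):              # the (p+1)-th finger was not chosen,
            cand = table[p + 1]             # so it cannot precede T
            if cw(lower, cand) < cw(lower, upper):
                upper = cand
    return lower, upper

nodes = list(range(0, RING, 5))             # toy ring: IDs 0, 5, ..., 255
```

In this model the true target always lies clockwise after `lower` and no further than `upper`, because a greedy lookup would have taken the $(p+1)$-th finger if that finger still preceded the target.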

Since the density of queries close to $T$ on the ring is
higher than in other regions, nodes located closer to $T$ in the
estimation range are more likely to be $T$. We let $\gamma(i, z)$
denote the probability that the $i$-th node (clockwise) in an
estimation range of size $z$ is the target, $1 \le i \le z$. The
probability distribution $\gamma(i, z)$ can be obtained by pre-simulation
of the lookup. Note that the range estimation
attack is inapplicable when the lookup has only one linkable
query (say $E_i$), but the adversary can use the successor of
$E_i$ as the lower bound of $T$ and the predecessor of $E_i$ as
the upper bound of $T$, and assign higher probabilities to the
nodes closer to the lower bound in the estimation range.

Now we discuss how to deal with dummy queries. Let $Q^l_I$
denote the set of linkable queries in the lookup performed
by $I$, and $\mathcal{R}^l_I$ denote the set of linkable non-dummy queries,
$\mathcal{R}^l_I \subseteq Q^l_I$. Based on whether $\mathcal{R}^l_I$ is an empty set, $H(T \mid o_l)$
can be calculated as:

$$H(T \mid o_l) = P(\mathcal{R}^l_I = \emptyset) \cdot H_m + \big(1 - P(\mathcal{R}^l_I = \emptyset)\big) \cdot H'(T \mid o_l) \tag{9}$$



$H_m$ denotes the entropy when all linkable queries are
dummies. In this case, the linkable queries cannot provide
any information about $T$. However, the adversary can observe
all (concurrent) malicious target nodes, and $T$ has
chance $f$ to be one of them. Therefore, we have:

$$H_m = (1-f) \cdot \log_2\big((1-f) \cdot N\big) + f \cdot \log_2(\#mal\_targets) \tag{10}$$
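To give a feel for the magnitude of Equation (10), the short Python sketch below evaluates it for one setting. The network size and malicious fraction match the values used later in the evaluation; the count of 200 observed malicious targets is a made-up figure for this example (20% of 1000 concurrent lookups), not a number reported in the paper.

```python
import math

def h_m(n, f, mal_targets):
    """Equation (10): entropy of T when every linkable query is a dummy."""
    return (1 - f) * math.log2((1 - f) * n) + f * math.log2(mal_targets)

N = 100_000          # network size
F = 0.20             # fraction of malicious nodes
MAL_TARGETS = 200    # hypothetical number of concurrently observed malicious targets

entropy = h_m(N, F, MAL_TARGETS)
max_entropy = math.log2(N)   # entropy when nothing at all is known about T
```

Under these assumed values the entropy stays within roughly two bits of the maximum $\log_2 N$, which reflects that dummy-only observations reveal little beyond whether $T$ is malicious.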



When $\mathcal{R}^l_I$ is non-empty, a range estimation attack based
on $\mathcal{R}^l_I$ can produce a minimum range of $T$. In contrast, an
estimation range calculated using any dummy query will
be incorrect. Due to the use of anonymous paths, an individual
dummy query is indistinguishable from any non-dummy
query. Nevertheless, the adversary can use timing and
location relationships between queries to filter out some
subsets of $Q^l_I$ that contain dummy queries. In particular,
any subset of queries that violates the following rules must
contain at least one dummy query:

• if $E_i$ is queried before $E_j$, then $E_i$ must precede $E_j$

• if $E_i$ and $E_j$ are the first and last queried nodes in the
subset, then any other query must be on the path of the
virtual lookup from $E_i$ to $E_j$

Note that the above approach cannot remove all subsets
that contain dummy queries. Let $S_I$ denote all subsets of $Q^l_I$
that pass the above filtering test, $S_I \subseteq 2^{Q^l_I}$. Since all queries
in $\mathcal{R}^l_I$ are non-dummy, $\mathcal{R}^l_I$ will pass the filtering test, i.e.,
$\mathcal{R}^l_I \in S_I$. From the adversary's perspective, each element
of $S_I$ could be $\mathcal{R}^l_I$. The best strategy for her is to
assign a different probability to each element in $S_I$ according
to the pre-calculated probability distribution of $\mathcal{R}^l_I$. We use
two variables to characterize $\mathcal{R}^l_I$: the number of queries in
$\mathcal{R}^l_I$, and the largest hop in the virtual lookup from the first
query in $\mathcal{R}^l_I$ to the last query.

Let $X$ denote an arbitrary node that is contained in any
estimation range. Then:

$$H'(T \mid o_l) = -\sum_{X} P(X = T \mid o_l) \cdot \log_2 P(X = T \mid o_l) \tag{11}$$

Let $G(s)$ denote the estimation range computed based on
$s$, $s \in S_I$. Let $loc(G(s), X)$ denote the location of $X$ in
the estimation range $G(s)$, and $|\cdot|$ denote the number of
elements of a set. Then, we have:

$$P(X = T \mid o_l) = \sum_{s \in S_I} P(s = \mathcal{R}^l_I \mid o_l) \cdot \gamma\big(loc(G(s), X), |G(s)|\big) \tag{12}$$



Footnote: The largest hop means the largest ID difference between two consecutively
queried nodes. The largest hop also implies the number of hops
in the lookup. These two characteristics are a close approximation of the
adversary's observation on $\mathcal{R}^l_I$.



Let $V(s)$ denote the largest hop in the virtual lookup
based on $s$, and $\chi(x, y)$ denote the probability that a set
of $x$ queries with the largest hop in the virtual lookup being
$y$ is $\mathcal{R}^l_I$. Then, we have:

$$P(s = \mathcal{R}^l_I \mid o_l) = \frac{\chi\big(|s|, V(s)\big)}{\sum_{s' \in S_I} \chi\big(|s'|, V(s')\big)} \tag{13}$$

$\chi(x, y)$ is obtained by pre-simulations of the lookup.
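Equations (11) to (13) combine into a simple mixture computation, sketched below in Python. The uniform `gamma_demo`, the constant `chi_demo`, and the two example ranges are placeholders invented for this illustration; the analysis above obtains $\gamma$ and $\chi$ from pre-simulations.

```python
import math
from collections import defaultdict

def subset_posteriors(subsets, chi):
    """Equation (13): subsets is a list of (number of queries, largest hop)."""
    weights = [chi(size, hop) for size, hop in subsets]
    total = sum(weights)
    return [w / total for w in weights]

def target_distribution(ranges, posteriors, gamma):
    """Equation (12): ranges[k] lists the nodes of G(s_k) in clockwise order."""
    dist = defaultdict(float)
    for nodes_in_range, p_s in zip(ranges, posteriors):
        z = len(nodes_in_range)
        for i, x in enumerate(nodes_in_range, start=1):
            dist[x] += p_s * gamma(i, z)
    return dict(dist)

def entropy_bits(dist):
    """Equation (11): Shannon entropy of the target distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def gamma_demo(i, z):
    """Hypothetical gamma: every position in the range is equally likely."""
    return 1.0 / z

def chi_demo(size, hop):
    """Hypothetical chi: every surviving subset is equally likely."""
    return 1.0

subsets = [(2, 16), (3, 64)]                    # (|s|, V(s)) for two candidate subsets
ranges = [["a", "b"], ["b", "c", "d", "e"]]     # their estimation ranges G(s)
posteriors = subset_posteriors(subsets, chi_demo)
dist = target_distribution(ranges, posteriors, gamma_demo)
h_prime = entropy_bits(dist)
```

Node `b` appears in both estimation ranges, so it accumulates probability from both subsets; this is the effect of the sum over $s \in S_I$ in Equation (12).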

Calculation of $H(T \mid o_d)$. There are three possible cases
when there is no linkable query:

• $case_1$: there is no query observed by the adversary

• $case_2$: there is at least one (observed) query that is
linkable to $B$

• $case_3$: there is no query linkable to $B$ but at least one
query is observed by the adversary

Let $H_1$, $H_2$, and $H_3$ denote the entropy of $T$ in the three
cases, respectively. Then, we have:

$$H(T \mid o_d) = P(case_1) \cdot H_1 + P(case_2) \cdot H_2 + P(case_3) \cdot H_3 \tag{14}$$

In the first case, since no information is learnt from
queries, $H_1$ is calculated as in Equation (10).

For the second case, although $I$ is disassociated from any
observed queries, the adversary can group queries belonging
to the same lookup based on whether they are linkable to a
common relay $B$. Furthermore, she can calculate estimation
ranges for each concurrent lookup that contains queries
linkable to $B$, and consider all estimation ranges as possible
candidates for the true estimation range of $T$.

In particular, we let $\mathcal{R}^B_I$ denote the set of non-dummy
queries linkable to $B$ in the lookup performed by $I$, and $H_2$
can be calculated as follows:

$$H_2 = P(\mathcal{R}^B_I = \emptyset) \cdot H_m + \big(1 - P(\mathcal{R}^B_I = \emptyset)\big) \cdot H'_2 \tag{15}$$

where $H'_2$ denotes the entropy when there is at least one non-dummy
query linkable to $B$. If $T$ is malicious, the adversary
can reduce the candidates of $T$ down to the set of observed
malicious targets; otherwise, she needs to rely on the queries
linkable to $B$ to infer $T$. Therefore, we have:

$$H'_2 = f \cdot \log_2(\#mal\_targets) - (1-f) \cdot \sum_{X} P(X = T \mid o_d) \cdot \log_2 P(X = T \mid o_d) \tag{16}$$

Let $\Psi^B$ denote the set of concurrent lookups that have at
least one query linkable to $B$, and $\psi_I$ denote the lookup
performed by $I$. Let $\mathcal{R}^B_\psi$ denote the set of non-dummy
queries linkable to $B$ in $\psi$, $\psi \in \Psi^B$, and let $S_\psi$ denote all
subsets of queries (linked to $B$) in $\psi$ that pass the filtering
test. Then, we have:

$$P(X = T \mid o_d) = \sum_{\psi \in \Psi^B} P(\psi = \psi_I \mid o_d) \cdot \sum_{s \in S_\psi} P(s = \mathcal{R}^B_\psi \mid o_d) \cdot \gamma\big(loc(G(s), X), |G(s)|\big) \tag{17}$$

Since $I$ is unlinkable to any observed queries, each
concurrent lookup is equally likely to be $\psi_I$. We have:

$$P(\psi = \psi_I \mid o_d) = \frac{1}{|\Psi^B|}, \quad \forall \psi \in \Psi^B \tag{18}$$

$P(s = \mathcal{R}^B_\psi \mid o_d)$ is calculated in the same way as Equation (13).

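In the last case, since all observed queries are disassociated
from each other, the range estimation attack cannot
be applied. Let $\mathcal{R}^o_I$ denote the set of observed non-dummy
queries in the lookup performed by $I$. Similar to the above
cases, we have:

$$H_3 = P(\mathcal{R}^o_I = \emptyset) \cdot H_m + \big(1 - P(\mathcal{R}^o_I = \emptyset)\big) \cdot H'_3 \tag{19}$$

$$H'_3 = f \cdot \log_2(\#mal\_targets) - (1-f) \cdot \sum_{X} P(X = T \mid o_d) \cdot \log_2 P(X = T \mid o_d) \tag{20}$$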



Let $E_I$ denote the query closest to $T$ in $\mathcal{R}^o_I$. Then, the
adversary can use $E_I$'s successor as the lower bound of the
estimation range of $T$ and $E_I$'s predecessor as the upper
bound, and assign probabilities to the nodes in the range
according to a pre-calculated probability distribution of $T$.
Let $Q^o$ denote the set of observed queries of all concurrent
lookups, and $G(E)$ denote the estimation range based on $E$.
Then, we have:

$$P(X = T \mid o_d) = \sum_{E \in Q^o} P(E = E_I \mid o_d) \cdot \gamma\big(loc(G(E), X), N-1\big) \tag{21}$$

Based on the observations, the adversary is unable to tell
which query is more likely to be $E_I$. Therefore, we have:

$$P(E = E_I \mid o_d) = \frac{1}{|Q^o|}, \quad \forall E \in Q^o \tag{22}$$



D. Results and Comparisons 

We developed a simulator for anonymity measurements
in C++ with about 1.3 KLOC. The results are shown in
Figure 10. With network size N = 100 000, concurrent
lookup rate α = 1%, f = 20% malicious nodes, and 6
dummies, Octopus leaks only 0.57 bits of information about
the initiator and 0.82 bits of information about the target. We
compare Octopus with the baseline scheme Chord [1] and
the state-of-the-art anonymous DHT lookups NISAN Q and
Torsk [8] (we do not explicitly compare Octopus with secure
DHTs such as Halo [16], since they leak more information
than Chord). We can see that in the same setting, NISAN
and Torsk leak about 3.3 bits of information about the
initiator, which is about 6 times more than Octopus. As for
the target anonymity, the information leak for NISAN and
Torsk is 11.3 bits and 3.4 bits, which is 13 times and 4 times
more than that of Octopus, respectively.

Figure 9: Anonymity evaluation of Octopus. α is the concurrent lookup rate.

Figure 10: Anonymity comparison, α = 1%.

Table II: Performance comparison. τ is the time interval between
two consecutive lookups.

Schemes    | Lookup Latency (sec) | Bandwidth Consumption (kbps)
           | Mean    | Median     | τ = 6 min  | τ = 10 min
Octopus    | 2.15    | 1.61       | 5.91       | 4.30
Chord [1]  | 1.35    | 0.35       | 0.29       | 0.28
Halo [16]  | 6.89    | 1.79       | 0.71       | 0.37

V. Performance Evaluation 
A. Lookup Latency 

Lookup latency is one of the most important performance
factors for DHT systems. We measure the lookup latency of
Octopus using PlanetLab with 207 randomly selected nodes.
We use the Boost C++ library (mainly the asynchronous UDP
read/write of Boost.Asio) to build the communication substrate.
We let each node perform 2000 lookups independently
using randomly picked lookup keys. For each lookup,
we record the latency from the time of sending out the first
query until the time of receiving the lookup result.

For comparison, we use the same methodology to implement
Chord [1] and Halo [16], and measure their lookup
latencies in the same network environment. We choose Halo
because it is one of the state-of-the-art secure DHT lookup
schemes and it is also based on the Chord overlay. For Halo, we
use degree-2 recursion with redundancy parameter 8 × 4, as
suggested in their paper [16] to provide a fairly strong security
guarantee. The experimental results are presented in Figure
11 and Table II. We can see that while the lookup latency of
Octopus is longer than that of Chord because of the additional
transmissions needed for security and anonymity, it is shorter
than that of Halo, which only provides security guarantees.
Octopus is faster because it does not rely on
redundant lookups, while in Halo a lookup is not completed
until the results of all redundant lookups are returned.

Footnote: www.boost.org

Figure 11: Comparison of lookup latency on PlanetLab.

B. Bandwidth Overhead 

We also compare Octopus with Chord and Halo in terms
of bandwidth cost. We adopt the same configuration as
described in Section III-C for each of the DHT lookups,
and consider an overlay network with 1 000 000 nodes.
We can see from Table II that Octopus does incur higher
communication overhead than Chord and Halo, in order to
achieve high levels of anonymity; however, the bandwidth
cost of Octopus is still reasonable (only a few kbps), which is
affordable even for low-end clients with limited bandwidth.

VI. Related Work 
A. Secure DHT Lookups 

A major school of proposals for securing DHT lookups
uses redundancy. Castro et al. ffl?! proposed a robust
DHT system that relies on redundant lookups. Each key
is replicated among several replica nodes (typically the
neighbors of the key owner). Instead of doing a single
lookup, the initiator performs multiple redundant lookups
towards all the replicas. The lookup result is correct
as long as one of the redundant lookups is not biased. The
limitation of this approach is that the redundant lookups
tend to converge to a small number of nodes close to the
target, and one malicious node in this set could infect many
redundant lookups. Much subsequent work (such as [6],
[16], [32]) focuses on disentangling the redundant lookup
paths to provide better security. Cyclone [32] partitions
nodes into r Chord sub-rings based on similarity of node
IDs, and has r redundant lookups routed through the r sub-rings
independently. Salsa [6] uses a new virtual-tree-based
DHT structure, in which any two nodes share few global
contacts so that redundant messages can proceed along
different paths. Halo [16] does not change the underlying
DHT structures, but uses the original Chord overlay and
performs redundant searches towards knuckles - nodes that
have fingers pointing to the target.

While effective in ensuring security, these redundant-lookup-based
approaches are incapable of preserving
anonymity, since redundant transmission creates opportunities
for an adversary to gain information about the lookup
initiator and/or target ifTSl . ShadowWalker IjOj embeds redundancy
into the DHT itself and uses shadows (nodes in
redundant topologies) to verify each step of a lookup. Unfortunately,
Schuchard et al. [33] found that ShadowWalker
is vulnerable to an eclipse attack, where the entire set of
shadows of a certain node is compromised, leading to
other nodes' routing states being infected. They also showed
that increasing the dimension of redundant topologies can
mitigate the eclipse attack, but the resultant performance cost
is prohibitively high.

Another major school of research on secure DHT lookups
leverages cryptographic techniques. Myrmic [25] uses an
online certificate authority to sign each node's routing state.
The major limitation of Myrmic is that for each node
join/churn, the central authority has to update the certificates
for all related nodes. Young et al. ifTTI proposed two
schemes, RCP-I and RCP-II, that use threshold signatures and
distributed key generation to avoid the reliance on a central
authority. In their schemes, the verification information on
each message is collaboratively generated by a threshold
number of nodes, rather than by a central authority.

Footnote: We use the following parameters to estimate the bandwidth overhead.
Each routing state item (such as fingers or successors) is 10 bytes. We use
an ECDSA signature (40 bytes) for authentication with a 4-byte timestamp,
and AES-128 for onion encryption. Each certificate is 50 bytes, including
the node's IP address (6 bytes), the node's public key (20 bytes), expire
time (4 bytes), and the CA's signature (20 bytes).
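The byte counts given in the bandwidth footnote (10 bytes per routing item, a 40-byte ECDSA signature, a 4-byte timestamp, and a 50-byte certificate) can be checked with a few lines of Python. The function `signed_table_bytes` rests on an assumption of this example, namely that a signed fingertable reply carries its routing items plus one signature, one timestamp, and one certificate; the paper does not spell out the exact message layout.

```python
# Sizes in bytes, taken from the bandwidth-estimation footnote.
ROUTING_ITEM = 10
ECDSA_SIGNATURE = 40
TIMESTAMP = 4
CERT_FIELDS = {
    "ip_address": 6,
    "public_key": 20,
    "expire_time": 4,
    "ca_signature": 20,
}

def certificate_bytes():
    """Total certificate size as the sum of its listed fields."""
    return sum(CERT_FIELDS.values())

def signed_table_bytes(items):
    """Assumed layout: routing items + signature + timestamp + certificate."""
    return items * ROUTING_ITEM + ECDSA_SIGNATURE + TIMESTAMP + certificate_bytes()

cert_size = certificate_bytes()
reply_size = signed_table_bytes(12)   # 12 routing items is a hypothetical table size
```

The field sizes sum to the 50-byte certificate stated in the footnote, so under the assumed layout the fixed authentication overhead per signed reply is 94 bytes.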

None of these secure DHT lookup schemes is designed
to preserve anonymity. Lookup keys are revealed during
queries, and the identities of lookup initiators are easily exposed
because initiators contact intermediate nodes directly.

B. Secure and Anonymous DHT Lookups 

NISAN Q is among the first to try to provide both
security and anonymity guarantees in DHT systems. For
security purposes, each queried node is required to provide
its entire fingertable, so that the lookup initiator can apply
bound checking on it to limit manipulation of fingertables.
NISAN also uses redundancy to enhance security.
The authors proposed a greedy-search mechanism to query
multiple nodes at each step and combine the query results
to tolerate misinformation. On the other hand, acquiring
the entire fingertable also helps protect the anonymity of
lookup targets, since the lookup keys are not revealed to
intermediate nodes. Nevertheless, NISAN can only provide
very limited anonymity protection. Wang et al. [14] showed
that a passive adversary is able to narrow the range of a
lookup target down to a small number of nodes, by analyzing
the locations of observed queries (called the range estimation
attack).

Torsk [8] is a DHT-based anonymous communication system.
A key component of Torsk is a proxy-based anonymous
DHT lookup. The idea is that a lookup initiator performs
a random walk on the overlay to find a random node
(called a buddy), and requests the buddy to perform the lookup
on its behalf. Because Torsk uses Myrmic [25] to secure
lookups, it has the same limitation as Myrmic - requiring
an online central authority to sign each node's routing state.
In addition, as we analyzed in Section IV-A, a single proxy
structure is insufficient to provide high levels of anonymity:
the information learnt by the range estimation attack can be
used to launch the relay exhaustion attack [14].

Recently, Backes et al. ifTsll proposed to leverage oblivious
transfer to add query privacy to RCP-I and RCP-II IITtI .
However, for similar reasons as NISAN, this scheme is
vulnerable to the range estimation attack, since the initiator
needs to contact multiple intermediate nodes at each step of
the lookup.

Freenet [34] is a deployed P2P system, which allows
people to upload sensitive files to the overlay and employs
data duplication strategies to make them hard to block.
Freenet aims to preserve the publishers' privacy, but does
not provide anonymity in lookups. Vasserman et al. [35]
create a membership concealing overlay network (MCON)
for unobservable communication. They aim to make it
difficult for either an insider or outsider adversary to learn
the set of participating members. This is similar to previous
darknet designs [36]. However, MCON and darknets are not
designed to provide anonymity.

VII. Conclusion 

In this paper, we presented Octopus, a new DHT lookup
that provides strong guarantees for both anonymity and
security. Octopus ensures security and anonymity via three
fundamental techniques. First, Octopus constructs an anonymous
path to send lookup query messages while hiding
the initiator. Second, it splits the individual queries used
in a lookup over multiple paths, and introduces dummy
queries, to make it difficult for an adversary to learn the
lookup target. Third, it uses secret security checks to identify
and remove malicious nodes. We developed an event-based
simulator, and showed that malicious nodes can be quickly
identified with high accuracy. In addition, via probabilistic
modeling and simulation, we showed that Octopus can
achieve near-optimal anonymity for both the lookup initiator
and target. We also evaluated the efficiency of Octopus on
PlanetLab, and showed that Octopus has reasonable lookup
latency and communication overhead.
Appendix

A. Random Walk for Relay Selection 

As shown in Figure 12, the random walk originates from
the initiator $I$ and is composed of two phases, with $l$ nodes
visited in each phase ($l = \log N$). The motivation for
dividing the random walk into two phases is to mitigate the
timing analysis attack, wherein malicious nodes on the same
anonymous path can be associated by analyzing timings of
packets in the traffic going through them. In the first phase, $I$
picks a random finger $U_1$ out of its fingertable, and requests
$U_1$ for its fingertable, from which the second hop $U_2$ is
selected. Then $I$ sends an onion-encrypted query to $U_2$,
using $U_1$ as the forwarding node, and selects the third hop $U_3$
at random from the fingertable returned by $U_2$. This process
is recursively repeated for $l$ hops. To provide an integrity check
and source authentication, each replied fingertable is signed
by its owner with the owner's certificate attached.

Figure 12: Two-phase random walk for selecting relays.

The second phase of the random walk is conducted by $U_l$,
the last node visited in the first phase. In particular, $I$ sends
$U_l$ a random seed through the anonymous path established
in the first phase, and the seed determines how $U_l$ picks
nodes "randomly". For example, we can let $U_l$ apply a hash
function to the seed $i$ times, map the hash value to
$[1, m]$ ($m$ is the size of fingertables), and use the result as an
index to select the $i$-th hop. The second phase is performed
in the same way as the first phase, and the last two hops
($U_{2l-1}$ and $U_{2l}$) are chosen as a pair of relays to be used
in lookups or attacker identifications. To prevent a malicious
$U_l$ from biasing the random walk, $U_l$ is required to keep all
received fingertables, signatures and certificates, and send
them back to $I$ through the anonymous path of the first phase
at the end of the random walk. Such information allows $I$ to
verify whether $U_l$ has honestly performed the random walk.
If the verification fails or $I$ does not receive the results
by a pre-set deadline, $I$ chooses another node from $U_{l-1}$'s
fingertable to restart the second phase of the random walk.
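The seed-driven hop selection and the initiator's check can be sketched in Python as follows. SHA-256, the modulo mapping to $[1, m]$, and the names `hop_indices` and `verify_walk` are choices made for this illustration; the text above only requires some hash function and some mapping to $[1, m]$. Signature and certificate verification are assumed to happen elsewhere and are omitted.

```python
import hashlib

def hop_indices(seed: bytes, hops: int, m: int):
    """Index in [1, m] for each hop: hash the seed i times for the i-th hop."""
    digest = seed
    indices = []
    for _ in range(hops):
        digest = hashlib.sha256(digest).digest()      # i-th iterate of the hash
        indices.append(int.from_bytes(digest, "big") % m + 1)
    return indices

def verify_walk(seed: bytes, tables, visited):
    """Recompute the indices and check each reported hop against its fingertable.

    tables[i] is the fingertable (list of node IDs) consulted at step i and
    visited[i] is the node reported as chosen from it. Signature checks on
    the tables are assumed to have been done already.
    """
    if len(tables) != len(visited):
        return False
    m = len(tables[0]) if tables else 0
    for table, node, idx in zip(tables, visited, hop_indices(seed, len(tables), m)):
        if len(table) != m or table[idx - 1] != node:
            return False
    return True

# Hypothetical walk over three fingertables of size 4.
seed = b"example-seed"
tables = [[10, 20, 30, 40], [11, 21, 31, 41], [12, 22, 32, 42]]
honest = [t[i - 1] for t, i in zip(tables, hop_indices(seed, 3, 4))]
```

Because the indices are a deterministic function of the seed, the initiator can recompute them and detect a walk in which any reported hop deviates from the prescribed choice.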

B. Other Attacks, Countermeasures, and Discussion 

1) Selective Denial-of-Service Attack: A threat to anonymous
communication systems (like Tor [41]), the selective
Denial-of-Service (DoS) attack [42] can increase the chance
of compromising anonymous circuits by selectively dropping
packets to tear down the circuits that are infeasible to
compromise. The selective DoS attack is also applicable
to Octopus. For example, to create more opportunities for
observing lookup initiators, malicious relays can selectively
drop lookup queries or replies when the relay directly
connected to the initiator is not malicious. Nevertheless,
under our framework of attacker identification, Octopus can
effectively constrain the selective DoS attack by identifying
malicious droppers.

We leverage the reputation-based reliability enhancement
strategy for mix networks [43] to identify malicious dropper
nodes. The idea is as follows. Each message is assigned a
deadline by which it must be sent to the next hop (in either
direction along the anonymous path). A relay X first tries
to send the message to the next hop Y directly. If Y is alive
and honest, it will send a signed receipt back to X. If X has
not received a receipt from Y by a specified period before
the deadline, it will request a pre-defined set of witnesses
(e.g., its successors and predecessors) to independently try to
send the message to Y and obtain a receipt. If a witness gets
a valid receipt, it will forward it to X; otherwise, it sends
X a signed statement of the delivery failure. Before sending
any query, the initiator first checks whether the first pair of relays
A, B are alive (B is checked by A). During the lookup,
if the initiator does not receive the i-th query reply by the
pre-set deadline, it queries the successors and predecessors
of the relays C_i, D_i (through the partial anonymous path A
and B) about their aliveness, which can be inferred based
on their recent stabilization activities. If both of them are
alive, the initiator reports the failure to the CA with the
identities of all the relays. Then the CA will request the
relays to provide either receipts or statements, and based on
the provided information, the CA will be able to identify the
malicious dropper node.
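The relay-side part of this receipt and witness procedure can be sketched in Python. The data types, the `send` callback, and the function `forward` are hypothetical names introduced for this illustration; deadlines, real signatures, and the CA's decision logic are deliberately left out, since the text above does not fix those details.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass(frozen=True)
class Receipt:
    signer: str        # next hop that acknowledged the message
    msg_id: str

@dataclass(frozen=True)
class FailureStatement:
    witness: str       # witness reporting that delivery to the target failed
    target: str
    msg_id: str

# send(sender, target, msg_id) returns a Receipt, or None if no receipt arrived in time.
Send = Callable[[str, str, str], Optional[Receipt]]

def forward(x: str, y: str, msg_id: str, witnesses: List[str],
            send: Send) -> Tuple[Optional[Receipt], List[FailureStatement]]:
    """Relay X's evidence for one hop: a receipt from Y, or witness statements."""
    receipt = send(x, y, msg_id)
    if receipt is not None:
        return receipt, []
    statements: List[FailureStatement] = []
    for w in witnesses:                       # each witness retries independently
        r = send(w, y, msg_id)
        if r is not None:
            return r, statements              # a witness obtained a valid receipt
        statements.append(FailureStatement(w, y, msg_id))
    return None, statements

def selective_send(sender: str, target: str, msg_id: str) -> Optional[Receipt]:
    """Toy network in which Y ignores X but answers everyone else."""
    return None if sender == "X" else Receipt(target, msg_id)

def dead_send(sender: str, target: str, msg_id: str) -> Optional[Receipt]:
    """Toy network in which Y never answers."""
    return None
```

Either outcome leaves X holding evidence that it can later present: a receipt signed by Y, or signed failure statements from its witnesses.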

The (selective) DoS attacks are also possible in random
walks. For example, a malicious hop of the random walk
could simply drop packets to prevent the random walk from
being completed, or a malicious $U_l$ could refuse to return the
random walk result to the initiator if the result only contains
honest nodes. We can use the same strategy as above to
identify such malicious nodes.

We use the event-based simulator (described in Section
III-C) to evaluate the defense mechanism for the
selective DoS attack. The simulation results are shown in
Figure 13. We can see that the malicious dropper nodes can
be rapidly discovered by our identification mechanism.

Figure 13: Selective DoS attack



2) Relay Exhaustion Attack: The relay exhaustion attack [14]
is a selective-DoS-flavored attack used to compromise
DHT-based anonymity systems that lack target
anonymity protection. In such an attack, an adversary can
use the information leaked about the lookup target to predict
the next relay of the circuit and launch a flooding-based attack
to prevent the circuit from being extended to the next relay.
While Octopus also uses relays in the lookup, it is resistant
to the relay exhaustion attack, since little information about
the target is leaked in Octopus (as shown in Section IV-C).

3) End-to-End Timing Analysis Attack: The end-to-end
timing analysis attack is an attack that associates two malicious
relays (e.g., $A$ and $D_i$ in Figure 8) on the same
anonymous path, by analyzing timings of packets in the
traffic going through them. Since in Octopus there is only
one message transmitted through each anonymous path in
either forward or backward direction, the timing analysis
attacks that require a large number of observed packets (such
as packet counting [37] or packet timing correlation [38],
[39]) are inapplicable to Octopus.

For Octopus, the only strategy to associate $A$ and $D_i$ is to use
the similarity of upstream and downstream latencies: in
a noise-free network environment, the transmission latency
from $A$ to $D_i$ should be the same as that from $D_i$ to $A$. However,
this similarity is determined by communication latency
jitter and can be easily destroyed by adding a short random
delay at the middle relay $B$. We simulate this attack using
the King dataset fJOl to find a pair of $A$ and $D_i$ with the smallest
difference between the upstream and downstream latencies.
A random delay is added at $B$ from the range $[0, T_m]$, where
$T_m$ is the maximum delay. We choose a typical network
setting as used in related works Q, [9], [13], [14]: there
are $N$ = 1 000 000 nodes in the network, 20% of them are
malicious nodes, and the concurrent lookup rate α is between
0.5% and 5%; the jitter window for a pair of communicating
peers is set to 10 ms or 10% of the average transmission
latency, whichever is smaller (according to [40]). Table III
presents the simulation results of this attack. When the
maximum delay $T_m$ is 100 ms and α = 5%, the error rate is
as high as 99.91%; in this case the information leak is only
$(1 - 99.91\%) \cdot \log_2(N \cdot 0.8 + N \cdot \alpha \cdot 0.2) = 0.018$ bit. This
means the adversary can hardly learn extra information by
launching the timing analysis attack.
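The information-leak figure quoted above can be reproduced with a few lines of Python. The function name and parameter names are introduced for this check only; the formula itself is the one stated in the text.

```python
import math

def timing_leak_bits(n, f, alpha, error_rate):
    """(1 - error rate) * log2(N * (1 - f) + N * alpha * f), as in the text."""
    candidates = n * (1 - f) + n * alpha * f
    return (1 - error_rate) * math.log2(candidates)

# N = 1 000 000 nodes, 20% malicious, alpha = 5%, error rate 99.91% (T_m = 100 ms).
leak = timing_leak_bits(1_000_000, 0.20, 0.05, 0.9991)
```

The candidate set has 810 000 members (about 19.6 bits), so a success rate of 0.09% yields a leak of just under 0.02 bit, in line with the 0.018 bit reported.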



Table III: Error rate of end-to-end timing analysis attack. α
is the concurrent lookup rate.

Max. delay | α = 0.5% | α = 1%  | α = 5%
100 ms     | 99.35%   | 99.50%  | 99.91%
200 ms     | 99.60%   | 99.82%  | 99.95%



